Machine Learning Final Project: Handwritten Sanskrit Recognition using a Multi-class SVM with K-NN Guidance

Authors

  • Yichang Shih
  • Donglai Wei
Abstract

We develop an optical character recognition (OCR) engine for handwritten Sanskrit using a two-stage classifier. Within the standard OCR pipeline, we focus on the classification problem, assuming the characters have already been adequately preprocessed. One challenge is that Sanskrit has about a hundred core characters: model-driven methods, such as the Support Vector Machine (SVM), must search a combinatorial model space that grows exponentially during training, while data-driven methods, such as k-nearest neighbors (kNN), become computationally costly at test time. To address this challenge, we propose a two-stage multiclass classifier that uses a non-parametric model to reduce the model space to be searched and a parametric model to relieve the computational burden while generalizing better. In the first stage, we apply kNN to coarsely assign a test sample to a candidate group of k classes; in the second stage, a k-class multiclass classifier labels the sample. Our method is fully automatic, highly accurate, and computationally efficient.
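
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify features, kernel, or training procedure, so the sketch assumes a linear SVM trained by batch subgradient descent, one-vs-one voting in the second stage, and a candidate group formed from the distinct classes among the k nearest neighbors. All function names, the toy data, and the hyperparameters are hypothetical.

```python
# Toy sketch of a kNN-guided multi-class SVM (pure Python, no dependencies).
# Assumptions: linear SVM, one-vs-one voting, hypothetical names and data.
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def knn_candidate_classes(x, train_X, train_y, k):
    """Stage 1: distinct classes among the k nearest training samples."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, xi)), yi)
        for xi, yi in zip(train_X, train_y)
    )
    return sorted({yi for _, yi in by_distance[:k]})


def train_linear_svm(X, y, lam=0.01, lr=0.01, iters=2000):
    """Binary linear SVM (labels +1/-1): batch subgradient descent on the
    L2-regularized hinge loss. The last weight acts as the bias term."""
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            xb = list(xi) + [1.0]
            if yi * dot(w, xb) < 1:  # sample violates the margin
                for j, xj in enumerate(xb):
                    grad[j] -= yi * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def two_stage_predict(x, train_X, train_y, k=5):
    """Stage 2: one-vs-one SVMs restricted to the stage-1 candidates."""
    candidates = knn_candidate_classes(x, train_X, train_y, k)
    if len(candidates) == 1:
        return candidates[0]
    votes = Counter()
    xb = list(x) + [1.0]
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            pairs = [(xi, 1 if yi == a else -1)
                     for xi, yi in zip(train_X, train_y) if yi in (a, b)]
            w = train_linear_svm([p[0] for p in pairs], [p[1] for p in pairs])
            votes[a if dot(w, xb) >= 0 else b] += 1
    return votes.most_common(1)[0][0]


# Hypothetical 2-D "features" for three classes; real inputs would be
# feature vectors extracted from preprocessed character images.
TRAIN_X = [(0, 0), (0, 1), (1, 0), (1, 1),
           (4, 4), (4, 5), (5, 4), (5, 5),
           (10, 0), (10, 1), (11, 0), (11, 1)]
TRAIN_Y = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

print(two_stage_predict((0.5, 0.5), TRAIN_X, TRAIN_Y, k=6))
```

In this sketch the SVMs are trained per query for brevity; with about a hundred classes, a practical variant would presumably cache the pairwise classifiers so that each pair is trained only once.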


Similar resources

A Comparative Study of SVM Models for Learning Handwritten Arabic Characters

In order to select the best SVM model for a specific machine learning task, a comparative study of SVM models is presented in this paper. We investigate the case of learning handwritten Arabic characters and we make use of tabu search metaheuristic in order to scan a large space of SVM models including multi-class scheme (one-against-one or one-against-all), SVM kernel function and kernel param...


Handwritten digit Recognition using Support Vector Machine

Handwritten numeral recognition plays a vital role in postal automation services, especially in countries like India where multiple languages and scripts are used. Discrete Hidden Markov Model (HMM) and hybrid of Neural Network (NN) and HMM are popular methods in handwritten word recognition system. The hybrid system gives better recognition result due to better discrimination capability of the N...


Fourier Descriptor based Isolated Marathi Handwritten Numeral Recognition

Numeral recognition remains one of the most important problems in pattern recognition. To the best of our knowledge, little work has been done in Devnagari script compared with those for non Indian scripts like Latin, Chinese and Japanese. In this paper we propose an effective method for recognition of isolated Marathi handwritten numerals written in Devnagari script. Fourier Descriptors that d...


Handwritten Devanagari Word Recognition: A Curvelet Transform Based Approach

This paper presents a new offline handwritten Devanagari word recognition system. Though Devanagari is the script for Hindi, which is the official language of India, its character and word recognition pose great challenges due to large variety of symbols and their proximity in appearance. In order to extract features which can distinguish similar appearing words, we employ Curvelet Tr...


Online Handwritten Digit Recognition Using Gaussian Based Classifier

Discrete Hidden Markov Model (HMM) and hybrid of Neural Network (NN) and HMM are popular methods in handwritten word recognition system. The hybrid system gives better recognition result due to better discrimination capability of the NN. A major problem in handwriting recognition is the huge variability and distortions of patterns. Elastic models based on local observations and dynamic programm...



Publication date: 2011